American Journal of Epidemiology — Latest Matching Preprints

1

Mechanism Matters: A Monte Carlo Evaluation of Estimator Validity and Collider Bias in Environmental Mixture Epidemiology

Obeng-Gyasi, E.

2026-05-26 epidemiology 10.64898/2026.05.25.26354044 medRxiv

Top 0.1%

52.9%

Background: Mixture epidemiology deploys sophisticated estimators, Bayesian kernel machine regression with causal mediation analysis (BKMR-CMA), quantile G-computation (QGC), and parametric G-computation, alongside conventional regression. Comparative evaluations have assumed additive, non-mediated data-generating processes, leaving conditions under which estimator choice determines causal validity uncharacterized. Methods: We developed a simulation framework using military-relevant exposure distributions (metals, per- and polyfluoroalkyl substances [PFAS], polychlorinated biphenyls [PCBs]) and allostatic load (AL) across three deployment tiers, with parameters drawn from military occupational health and contamination literature. Four data-generating processes were specified as directed acyclic graphs: direct effects with confounding (M1), full mediation through AL (M2), synergistic AL-exposure interaction (M3), and collider structure (M4). We evaluated ordinary least squares (OLS), QGC, G-computation, and BKMR-CMA on bias, root mean squared error, and 95% confidence interval coverage across 500 Monte Carlo replications at n = 500 and n = 1,000. Results: No estimator dominated across all mechanisms. Under M1, OLS and G-computation produced near-identical modest positive bias; BKMR-CMA achieved lower root mean squared error through kernel shrinkage. Under M2, BKMR-CMA exhibited severe positive bias for AL (mean bias = +0.579 SD units; coverage = 32.8%). Under M3, BKMR-CMA was the only estimator achieving nominal 95% coverage for AL (95.2%), while regression-based approaches fell to 83.6%. Under M4, G-computation produced persistent bias and near-zero coverage for lead, reflecting structural non-identification. Conclusions: Estimator validity is fundamentally mechanism-dependent. 
Researchers should base estimator choice on explicit causal assumptions about whether AL functions as confounder, mediator, moderator, or collider, particularly in military and occupational cohorts. We provide a mechanism-to-estimator mapping for applied researchers.
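
The three performance metrics named in the abstract (bias, root mean squared error, and 95% confidence interval coverage) can be illustrated with a minimal Monte Carlo loop. The sketch below is not the authors' simulation framework: it evaluates a plain sample mean on Gaussian data, and the function name `evaluate_estimator` and all parameter values are assumptions chosen only for illustration.

```python
import math
import random
import statistics


def evaluate_estimator(true_mean=1.0, sd=2.0, n=500, reps=500, seed=0):
    """Monte Carlo bias, RMSE, and 95% CI coverage for the sample mean.

    Hypothetical toy setup: the "estimator" is the sample mean of
    Gaussian draws, not any of the mixture estimators in the abstract.
    """
    rng = random.Random(seed)
    errors = []
    covered = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        estimate = statistics.fmean(sample)
        std_err = statistics.stdev(sample) / math.sqrt(n)
        lower = estimate - 1.96 * std_err
        upper = estimate + 1.96 * std_err
        errors.append(estimate - true_mean)
        # Coverage counts replications whose interval contains the truth.
        covered += lower <= true_mean <= upper
    bias = statistics.fmean(errors)
    rmse = math.sqrt(statistics.fmean([e * e for e in errors]))
    return {"bias": bias, "rmse": rmse, "coverage": covered / reps}


result = evaluate_estimator()
print(result)
```

For a correctly specified estimator, bias should sit near zero and coverage near 0.95; departures from those values are the kind of signal the abstract reports for misspecified mechanisms.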

2

Bayesian joint modelling of antibody kinetics and test-negative vaccine effectiveness to characterise hybrid immunity across epidemic waves

Benammar, A.

2026-04-27 epidemiology 10.64898/2026.04.25.26351732 medRxiv

Top 0.1%

51.5%

Vaccine effectiveness against symptomatic SARS-CoV-2 infection varies over time and across epidemic waves. This variation can reflect waning immunity, immune escape by emerging variants, exposure heterogeneity, and differences in previous infection history. Test-negative case-control designs are widely used to monitor vaccine effectiveness, while longitudinal serological studies describe antibody trajectories after vaccination and infection. These evidence streams are often analysed separately. This manuscript presents a simulation-based Bayesian joint modelling framework that links individual-level antibody kinetics to test-negative vaccine effectiveness estimates across successive epidemic waves. Hybrid immunity is represented as the combined effect of vaccination and infection history, with latent antibody titres following a boost-and-decay process after each immunising event. A variant-specific titre-protection curve maps latent antibody levels to the risk of symptomatic infection. The framework is intended to illustrate how apparent changes in vaccine effectiveness may be decomposed into components related to waning, immune escape, and exposure heterogeneity. Using fully synthetic data calibrated to plausible vaccination schedules, infection histories, assay variability, and epidemic-wave structures, the model is evaluated in three simulation studies. The simulations illustrate that joint modelling can recover broad features of the assumed titre-protection relationship under idealised conditions and can separate waning from variant-specific shifts when the data-generating process is correctly specified. The results are not presented as validation on real-world surveillance data. Instead, they provide a transparent methodological proof of concept and identify assumptions that would need to be assessed before applying the framework to linked serological and test-negative datasets. Author declarations: This manuscript reports a methodological simulation study. 
All individual-level data used in the manuscript are synthetic. No human participants, patient records, biological samples, or identifiable data were used. No ethics approval was required for the analyses presented here. The author declares no competing interests. This study did not receive external funding.

3

Bias from small-count suppression in county-level cancer disparity estimates: a calibrated simulation study

Gahan, K.

2026-06-08 epidemiology 10.64898/2026.06.05.26355021 medRxiv

Top 0.1%

38.0%

Background. Area-level cancer disparities are routinely estimated from public county data in which rates based on small counts (fewer than 16 cases or deaths) are suppressed. Analysts typically drop suppressed counties (complete-case analysis). Because suppression depends on case counts tied to population size and demographic composition, this missingness may be informative, but its effect on the disparity estimate has not, to our knowledge, been quantified. Methods. In a cross-sectional ecological study of 3,143 U.S. counties (analytic sample 3,018 with computable exposure) using one frozen public release of NCI State Cancer Profiles incidence and mortality data and ACS 2018-2022 5-year data, we estimated the most- versus least-deprived ICE(race+income) quintile rate ratio (RR) and rate difference for female breast, stomach, and cervix cancers under four suppression-handling methods: complete-case, available-case, bounding, and model-based small-area estimation. We characterized which counties were erased, and, following the ADEMP framework, ran a Monte Carlo simulation (1,000 replicates per cell; Monte Carlo standard error of bias approximately 0.0025) calibrated to the release to measure bias against a known truth. Analyses were pre-registered. Results. The suppressed fraction rose with rarity: 7.4% of counties for breast, 61.3% for stomach, and 75.7% for cervix incidence. Suppression was concentrated in the most-deprived quintile (cervix, 81.8% suppressed vs 63.8% least-deprived) and overwhelmingly removed rural rather than minority residents (cervix: 81% of the rural but 9% of the minority population erased). For breast (little suppression) the RR was 0.87 (95% CI 0.85-0.89) and identical across methods; for cervix incidence the complete-case RR (1.56) exceeded the model-based estimate (1.50), and for cervix mortality (91% suppressed) complete-case (1.86) exceeded model-based (1.56) by 16% with a wide bounding interval (1.88-2.62). 
In calibrated simulation, population-weighted complete-case bias was small (less than 2%) at the observed deprivation-county-size correlation and grew with rarity, threshold, and unweighted aggregation; its direction was conditional, becoming positive (over-estimation) as deprived counties became smaller. Conclusions. Complete-case handling of suppressed counties over-estimates rare-cancer area disparities relative to methods that retain them, while silently erasing most of the rural and most-deprived communities the estimate is meant to represent. The effect is negligible for common cancers and grows with rarity. Public-data disparity analyses should report the suppressed fraction and use bounded or model-based estimates by default. Keywords: cancer disparities; small-count suppression; Index of Concentration at the Extremes; informative missingness; small-area estimation; rural health.

4

Direct and mediated effects (DME) SLCMA: a novel method for life course modelling with time-varying covariates

Beer, S.; Simpkin, A. J.; Eldeeb, S. Y.; Zar, H. J.; Stein, D. J.; Dunn, E. C.; Smith, A. D. A. C.

2026-06-06 epidemiology 10.64898/2026.05.29.26354427 medRxiv

Top 0.1%

33.3%

Background: In prospective cohort studies, where an exposure is collected repeatedly, interest often lies in determining whether the timing of that exposure has a differential effect on a later outcome. The Structured Life Course Modeling Approach (SLCMA), where users select between temporal hypotheses of exposure specified a priori, provides one way to analyse such longitudinal data. However, few studies using SLCMA consider the effect of time-varying covariates (TVC) which may impact associations. Methods: We present a modified version of the SLCMA - called direct and mediated effects (DME)-SLCMA - which corrects for TVC. We first develop the DME-SLCMA method, test it through simulation, and apply it to psychosocial data from the Drakenstein Child Health Study (DCHS, n=336) to investigate relationships between maternal psychopathology, TVC of socioeconomic status, and offspring depressive symptoms. Results: We found that, on average, offspring depressive symptoms score increased by 3.9% (95% CI: 1.0%-6.9%, p = 0.039) for each unit of maternal psychopathology (SRQ) at 48 months whilst adjusting for time-varying socioeconomic status (at 18, 30, 42 and 54 months). Our simulations identified several realistic scenarios where selections ignoring TVC - with TVC mediated exposure effects present - were prone to be incorrect, including our DCHS example. Conclusion: DME-SLCMA is a robust new approach for life course modelling in the presence of time-varying covariates. We recommend adjusting for TVC whenever possible, and, when not possible, our simulation study identified that scenarios where mediated effects are comparable, or greater, in magnitude to direct effects are most prone to confounding.

5

Estimating COVID-19 Cumulative Incidence from Seroprevalence Surveys accounting for Time-Varying Seroreversion: A Fully Bayesian Methodology

Owusu-Boaitey, N.; Meyer, M. J.; Herrera-Esposito, D.; Bottcher, L.; Lukz, M.; Cook, S.; Stoto, M. A.; Kraemer, J. D.

2026-06-10 epidemiology 10.64898/2026.06.09.26355264 medRxiv

Top 0.1%

33.0%

Seroprevalence surveys reveal the extent of humoral immunity against pathogens such as severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), and under some circumstances represent cumulative incidence of prior infection. However, antibody waning - or seroreversion - biases these estimates by reducing assay sensitivity in a time-varying manner. Because assay sensitivity decays over time, naively using serosurveys can substantially bias estimates of SARS-CoV-2 cumulative incidence and fatality rates. The Bayesian assay-specific, time-varying sensitivity adjustment developed in this paper can reliably correct for this bias and account for the delay between infection and serosurvey. In seroprevalence studies conducted in the United States in 2020, adjusting for time-varying sensitivity increased cumulative incidence by up to 1.4-fold, with an adjustment of 1.08 for a national study. Our estimates contrast with a previously published 2-fold adjustment that did not account for assay design. This suggests that previous analyses overestimated cumulative incidence by applying seroreversion corrections that did not account for assay-specific effects, or underestimated cumulative incidence by not applying seroreversion corrections. These biases imply fatality rate underestimation and overestimation, respectively. Our model provides a framework for design-specific time-varying sensitivity corrections in seroprevalence surveys for other pathogens.
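
The direction of the seroreversion bias described above can be seen with the classical Rogan-Gladen correction, which divides observed seroprevalence by assay sensitivity (with perfect specificity). The sketch below is a simple point correction, not the Bayesian assay-specific model developed in the paper; the geometric decay in `waning_sensitivity` and every numeric value are assumptions made only for illustration.

```python
def adjust_prevalence(observed, sensitivity, specificity=1.0):
    """Rogan-Gladen corrected prevalence, clipped to [0, 1]."""
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    adjusted = (observed + specificity - 1.0) / denom
    return min(max(adjusted, 0.0), 1.0)


def waning_sensitivity(months, initial=0.95, monthly_decay=0.03):
    """Hypothetical sensitivity that decays geometrically after infection."""
    return initial * (1.0 - monthly_decay) ** months


observed = 0.10  # illustrative observed seroprevalence
adjusted_early = adjust_prevalence(observed, waning_sensitivity(0))
adjusted_late = adjust_prevalence(observed, waning_sensitivity(6))
print(adjusted_early, adjusted_late)
```

Because the assumed sensitivity falls with time since infection, the same observed seroprevalence implies a larger corrected cumulative incidence when the survey is run later.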

6

HHBayes: A Flexible Bayesian Framework for Simulating and Analyzing Household Transmission Dynamics

Li, K.; Hou, Y.; Mukherjee, B.; Pitzer, V. E.; Weinberger, D. M.

2026-04-03 infectious diseases 10.64898/2026.04.01.26349903 medRxiv

Top 0.1%

28.9%

Household transmission studies are important for understanding infectious disease transmission and evaluating interventions; however, they are frequently constrained by methodological challenges, including in study design and sample size determination, and in estimating parameters of interest after collecting the data. Existing tools often lack flexibility in modeling age-specific susceptibility, infectivity patterns, and the impact of interventions such as vaccination or prophylaxis. Here, we develop HHBayes, an open-source R package that provides a unified framework for simulating and analyzing household transmission data using Bayesian methods. The package enables researchers to: (1) simulate realistic household transmission dynamics with highly customizable variables; (2) incorporate viral load data (measured in viral copies/mL or cycle threshold values) to model time-varying infectiousness; (3) estimate age-dependent susceptibility and infectivity parameters using Hamiltonian Monte Carlo methods implemented in Stan; and (4) evaluate intervention effects through user-defined covariates that modify susceptibility or infectivity. We demonstrate the capabilities of the package through simulation studies showing accurate parameter recovery and applications to seasonal respiratory virus transmission, including the impact of vaccination and antiviral prophylaxis on household attack rates. HHBayes addresses a critical gap in infectious disease epidemiology by providing researchers with accessible tools for both prospective study design and retrospective data analysis. The flexibility of the package in handling complex household structures, time-varying infectiousness, and intervention effects makes it valuable for studying diverse pathogens.

7

Changing COVID-19 vaccine eligibility could reshape disease burden for all

Larsen, S. L.; Martinez, P. P.; Mahmud, A.

2026-04-29 epidemiology 10.64898/2026.04.27.26351870 medRxiv

Top 0.1%

23.2%

COVID-19 vaccine recommendations are evolving in the United States. While older adults are most at risk of severe COVID-19 outcomes and therefore experience the greatest direct benefits of vaccination, limiting vaccination to only this age group could worsen outcomes in this higher-risk population. Here, we leveraged data from a statewide survey in Illinois to inform transmission models accounting for contact and vaccination rates across age. Simulating a single season of COVID-19 transmission, we compared deaths under existing vaccination coverage against counterfactual scenarios where individuals under 5 or under 65 were never vaccinated. We find substantial indirect vaccine impacts for older adults. Our results suggest that existing vaccination coverage among younger people is mitigating COVID-19 mortality for older populations. These findings can provide insights into the long-term consequences of deprioritizing young adults and children from vaccination campaigns, and suggest that a lack of vaccine-induced immunity may impact outcomes in other age groups. This underscores the importance of considering indirect vaccine impacts when developing policy.

8

Incorporating Uncertainty in Study Participants' Age in Serocatalytic Models

Chen, J.; Lambe, T.; Kamau, E.; Donnelly, C.; Lambert, B.; Bajaj, S.

2026-03-16 infectious diseases 10.64898/2026.03.14.26346885 medRxiv

Top 0.1%

23.1%

Serological surveys measure the presence of antibodies in a population to infer past exposure to an infectious pathogen. If study participants' ages are known, serocatalytic models can be used to retrace the historical transmission strength of a pathogen within that population, quantified by the force of infection (FOI). These models rely on age information as a key variable since infection risks are interpreted in relation to how long individuals have been at risk. However, due to data constraints, participants' ages may be provided only within "age bins". A common approach is then to assign individuals' ages to be the midpoints of their respective age bins, ignoring uncertainty in this quantity. In this study, we quantify the bias introduced by this midpoint approach and develop a Bayesian framework that explicitly accounts for uncertainty in age. By comparing inference under constant, age-dependent, and time-dependent FOI scenarios, we show that incorporating uncertainty in age in serocatalytic models yields more reliable FOI estimates without sacrificing computational complexity. These improvements support the interpretation of serological data and inform public health decisions, such as estimating disease burden and identifying targeted vaccination groups.
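
Why the midpoint shortcut can mislead is easiest to see in the standard constant-FOI serocatalytic model, where the probability of being seropositive at age a is 1 - exp(-lambda * a). The sketch below compares that probability at a bin midpoint with its average over the whole bin. It assumes ages are uniformly distributed within the bin and uses illustrative values; it is not the Bayesian framework proposed in the paper.

```python
import math


def seroprevalence(age, foi):
    """Constant-FOI serocatalytic model: P(seropositive by a given age)."""
    return 1.0 - math.exp(-foi * age)


def bin_average_seroprevalence(lower, upper, foi):
    """Mean seroprevalence over a bin, assuming ages uniform on [lower, upper]."""
    width = upper - lower
    integral = (math.exp(-foi * lower) - math.exp(-foi * upper)) / foi
    return 1.0 - integral / width


foi = 0.1                 # illustrative force of infection per year
lower, upper = 0.0, 20.0  # illustrative age bin in years
midpoint_value = seroprevalence((lower + upper) / 2.0, foi)
bin_average = bin_average_seroprevalence(lower, upper, foi)
print(midpoint_value, bin_average)
```

Since the seroprevalence curve is concave in age, the midpoint value exceeds the bin average under this uniform-age assumption, so fitting to midpoints would tend to pull the FOI estimate downward to match the observed data.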

9

Physical activity and body mass index inequities among adult women in the United States: An application of intersectional multilevel analysis of individual heterogeneity and discriminatory accuracy (I-MAIHDA)

Echeverria, S.; Seo, Y.; Borrell, L. N.; McKelvey, D.; Najjar, T.; Reifsteck, E. J.; Erausquin, J. T.; Maher, J. P.

2026-04-07 epidemiology 10.64898/2026.04.06.26350273 medRxiv

Top 0.1%

22.4%

Background Physical activity (PA) and body mass index (BMI) shape cardiovascular risk, particularly in women. Yet, little research exists examining intersectional social axes shaping PA and BMI inequities among women living in the United States (US). Methods Data included women sampled in the 2015-2020 National Health and Nutrition Examination Survey. We used Intersectional Multilevel Analysis of Individual Heterogeneity and Discriminatory Accuracy (I-MAIHDA) via linear models to examine PA (n=4,591) and BMI (n=4,596) inequities across intersectional strata defined by race/ethnicity, age, education, nativity, and work status. We further quantified the contribution of these strata to the observed inequities and estimated additive fixed effects. Results In the null model, intersectional strata explained 4.6% and 13.8% of the variance in PA and BMI inequities, respectively, with 99.2% for PA and 97.5% for BMI explained by age, race/ethnicity, education, nativity, and occupation status. On average, Asian and Black women, those aged 35-49 years, those born outside the US, and those with less than a high school diploma had the lowest predicted mean PA. For BMI, Black and Hispanic/Latino women and those younger than 64 years had the highest mean BMI. Conclusion PA and BMI inequities are mostly explained by race/ethnicity, age, education, nativity, and work status. Our findings offer insights into universal and potential policy-informed health promotion strategies that may be tailored to women with these social identities and lived experiences that have shaped physical activity and body mass index inequities.

10

The pitfalls of incidence-based time series regression for inferring the effects of weather on infectious diseases

Gemo, P.; Barrero Guevara, L. A.; Kussmaul, C.; Kramer, S. C.; Domenech de Celles, M.

2026-03-15 epidemiology 10.64898/2026.03.13.26348326 medRxiv

Top 0.1%

22.2%

A central question in environmental epidemiology is how the weather affects infectious diseases. Time-series regression (TSR) on population-level case incidence data is widely used to estimate weather effects; however, this design may be biased due to the complexities of infectious disease dynamics, including nonlinear feedback, various types of noise, and latent, dynamic variables such as population immunity. Here, we assess the reliability of incidence-based TSR through a controlled simulation study across four different climates and fifty scenarios representing different pathogens. For each scenario, we simulated 10 years of weekly incidence data using a simple transmission model that included real-world weather data on temperature and relative humidity. We then examined whether the ground-truth weather effects could be recovered from model simulations using negative binomial generalized additive models, a flexible class of TSR models commonly used in empirical applications. We find that these models frequently fail to yield accurate and precise estimates of weather effects, even under favorable conditions such as no process noise and low observation noise (overdispersion). Hence, our results caution against the indiscriminate use of TSR models and suggest that more mechanistic approaches are needed for statistical inference of weather effects from population data.

11

Disentangling infectiousness and susceptibility by age group using transmission pair data: a study of SARS-CoV-2 household transmission

Leung, K. Y.; Miura, F.; Backer, J. A.

2026-06-05 epidemiology 10.64898/2026.06.04.26354892 medRxiv

Top 0.1%

21.5%

Background Differential contributions to transmission across age groups have been reported for many respiratory infections, including SARS-CoV-2. They are crucial for estimating the impact of age-specific interventions. Disentangling these age-dependent contributions remains challenging, as they may reflect differences in contact rates, biological susceptibility, or infectiousness. Aim We aim to jointly estimate age-specific per-contact infectiousness and susceptibility and their effect on the impact of age-specific interventions. Methods The age-specific infectiousness and susceptibility were jointly estimated in a Bayesian framework by combining contact data with transmission pair data (who-infected-whom). We applied this approach to 197,840 self-reported household transmission pairs collected in the Netherlands during the COVID-19 pandemic. Using these estimates, we projected the expected impact of school closure and work-from-home measures during the early stages of an epidemic in the absence of other interventions. Results Both infectiousness and susceptibility to SARS-CoV-2 infection were lowest in children aged 0-9 years and highest in adults over 30 years old, with 2- to 4.5-fold differences between these groups. Projected impacts of age-specific interventions indicated that school closures would reduce the reproduction number by 8% or 29% when age-specific susceptibility and infectiousness were or were not considered, respectively. Conversely, working-from-home policies would lead to reductions of 41% with and 20% without age-specific infectiousness and susceptibility. Conclusion Our method enables robust estimation of age-specific infectiousness and susceptibility. Accounting for these age heterogeneities is essential for projecting the impact of age-targeted interventions. Our approach is adaptable to other respiratory infections and can guide more tailored public health responses.

12

Methodological Considerations in Sibling Analyses of Prenatal Acetaminophen

Ahlqvist, V. H.; Sjoqvist, H.; Gardner, R. M.; Lee, B. K.

2026-03-30 epidemiology 10.64898/2026.03.27.26349515 medRxiv

Top 0.1%

18.9%

Background: Sibling-matched designs control for shared familial confounding but remain vulnerable to non-shared confounders. Bi-directional sensitivity analyses, which stratify families by whether the older or younger sibling was exposed, are commonly used to assess carryover effects. We aimed to demonstrate how this methodological approach can introduce severe confounding by parity. Methods: We conducted simulations motivated by a recent epidemiological study. The true causal effect of a hypothetical exposure (prenatal acetaminophen) on neurodevelopmental outcomes was set to strictly null. To introduce parity-related confounding, baseline exposure and outcome probabilities were varied slightly by birth order. We compared conditional logistic regression effect estimates from total sibling models against bi-directional stratified models. Results: In the total simulated sibling cohort, models yielded the true null effect (odds ratio = 1.00) when adjusting for parity. However, the bi-directional analyses exhibited divergent artifactual signals. Because parity is perfectly collinear with exposure in these stratified subsets, it cannot be adjusted for. For example, when the older sibling was exposed, the odds ratio for autism spectrum disorder was 1.68; when the younger was exposed, the odds ratio was 0.60. Conclusions: Divergent estimates in bi-directional sibling analyses can be a predictable artifact of parity confounding rather than evidence of carryover effects or invalidating unmeasured bias. Overall sibling models adjusting for parity may remain robust despite divergent stratified sensitivity results.

13

Sexual risk behaviours following medical male circumcision: a matched pseudo-cohort analysis using population-based survey data

Mwakazanga, D. K.; Daka, V.; Gwasupika, J. K.; Dombola, A. K.; Kapungu, K. K.; Khondowe, S.; Chongwe, G. K.; Fwemba, I.; Ogundimu, E.

2026-04-13 epidemiology 10.64898/2026.04.11.26350676 medRxiv

Top 0.1%

15.0%

Medical male circumcision (MMC) is an established HIV prevention intervention, yet concerns persist that circumcised men may adopt higher-risk sexual behaviours following the procedure. Evidence from observational studies has been inconsistent, partly because many analyses do not adequately distinguish behaviours that occur before circumcision from those that occur afterward. This study assessed the association between MMC and subsequent sexual behaviours while demonstrating how population-based cross-sectional survey data can be adapted to address this temporal challenge. We analysed nationally representative data from the 2024 Zambia Demographic and Health Survey (ZDHS), including men aged 15-59 years who reported their circumcision status. Men who had undergone medical circumcision were compared with uncircumcised men using a matched pseudo-cohort framework that reconstructed temporal ordering based on age at circumcision. Propensity score overlap weighting was applied to improve comparability between circumcised and uncircumcised men, and odds ratios were estimated using logistic regression models incorporating overlap weights and accounting for the complex survey design. Sexual behaviour outcomes occurring after circumcision included condom non-use at last sexual intercourse, multiple sexual partners in the past 12 months, self-reported sexually transmitted infection (STI) symptoms, and composite measures of sexual risk behaviour. The analysis included 9,609 men, of whom 33.3% were medically circumcised. MMC was associated with lower odds of condom non-use at last sexual intercourse (adjusted odds ratio [aOR] = 0.75, 95% confidence interval [CI]: 0.67-0.85) and lower odds of reporting any sexual risk behaviour (aOR = 0.83, 95% CI: 0.72-0.95). No meaningful associations were observed between MMC and reporting multiple sexual partners, self-reported STI symptoms, or higher levels of composite sexual risk behaviour. 
In this population-based study, MMC was not associated with sexual risk compensation under routine programme conditions within the overlap population defined by the weighting scheme, supporting the behavioural safety of MMC and illustrating the value of explicitly addressing temporality when analysing behavioural outcomes using cross-sectional survey data.

14

Operationalizing the neural exposome for brain health and Alzheimer's Disease and Related Dementias (AD/ADRD) vulnerability in rural settings: pilot study

Souza-Talarico, J. N.; Lehmler, H.-J.; Caldwell, J. K.; Cortes, Y.; Zuelsdorff, M.; Fun, Y.; Embree, J.; Doyle, C.; Halverson, K.; Martinez Rangel, M.; Harb, A.; Croskey, O.; Britt, K.; Howland, C.; Capuano, A. W.

2026-06-01 public and global health 10.64898/2026.05.21.26353825 medRxiv

Top 0.1%

14.7%

INTRODUCTION: Alzheimer's disease and related dementias (AD/ADRD) arise from cumulative environmental, social, behavioral, and biological influences across the life course. The neural exposome framework conceptualizes how exogenous, behavioral, and endogenous factors interact to shape brain health; however, its application to preclinical AD/ADRD research, particularly in rural populations, remains limited. METHODS: We developed and piloted a community-embedded, decentralized research model to operationalize the neural exposome framework among cognitively unimpaired adults aged 45+ in two rural Midwestern U.S. communities, integrating environmental, social, behavioral, geospatial, and biological measures to evaluate exposure-related neurobiological and cognitive vulnerability. RESULTS: This approach demonstrated high feasibility and acceptability, achieving strong recruitment, retention, data completeness, and multidomain biomarker collection in rural community-based settings. DISCUSSION: Pilot findings support the feasibility of neural exposome-informed research in rural U.S. communities and highlight its potential to advance prevention-oriented research on brain health and AD/ADRD.

15

Mapping the Dynamic Interplay of Mental Health and Weight Across Childhood: Data-Driven Explorations Using Causal Discovery

Larsen, T. E.; Lorca, M. H.; Ekstrom, C. T.; Vinding, R.; Bonnelykke, K.; Strandberg-Larsen, K.; Petersen, A. H.

2026-04-17 epidemiology 10.64898/2026.04.16.26350943 medRxiv

Top 0.1%

14.3%

Childhood weight development, especially overweight and obesity, has been associated with mental health, but their dynamic, causal relationships, and whether these differ by sex, remain unclear. We applied causal discovery to data from the Danish National Birth Cohort (n=67,593) spanning six periods from pregnancy to late adolescence and considering 67 variables related to child and parental weight, mental health, lifestyle, and socio-economic factors. We found no statistically significant difference between the causal graphs for boys and girls (P=0.079). The data-driven models found causal influence of childhood weight on subsequent weight status. Mental health pathways were exclusively within or across adjacent periods and centered on early adolescent stress. We examined the interplay between a subset of mental health variables, containing information on externalizing and internalizing problems, and weight, and found no direct causal pathway between the two processes. These findings suggest that observed links between weight and these mental health measures may be attributable to confounding. Our findings demonstrate the value of data-driven causal discovery in large cohort studies and how to test for differences in causal mechanisms across subgroups. Results are available in an interactive application, enabling future research to further explore the interplay between weight and mental health.

16

A New Mixed Frequency Regression Model For Environmental Epidemiology

Shukla, N.; Bartington, S. E.; Hansell, A. L.; Lucas, T. C.

2026-06-04 epidemiology 10.64898/2026.06.03.26354801 medRxiv

Top 0.1%

12.4%

Background: In the absence of high-resolution response data, exposure-response modelling often relies on aggregated low-frequency exposure data, leading to loss of high-resolution information. Mixed Data Sampling (MIDAS) from econometrics offers an alternative but is limited by its inability to make high-resolution predictions, inflexible likelihoods and penalised nonlinear functions, and limited visualization options. We propose a mixed-frequency Distributed Lag Non-linear Model (mf-DLNM) which can eliminate the need to aggregate exposure data in environmental epidemiology and provide high-resolution predictions for time series studies. Methods: We evaluated the inferential and predictive performance of the mf-DLNM. To evaluate its ability to estimate exposure-response relationships, we applied mf-DLNM and same-frequency (sf)-DLNM using data from the West Midlands, UK. Additionally, we compared the predictive performance of mf-DLNM with sf-DLNM and MIDAS across nine regions of England. As MIDAS cannot predict at the resolution of the predictor (daily), we compared the predictive performance of mf-DLNM and MIDAS at weekly resolution. To test the model's ability to predict risk at high temporal resolution (daily), we compared sf-DLNM (with access to daily mortality counts) with mf-DLNM (with access only to weekly mortality counts). Results: In the West Midlands example, mf-DLNM performed comparably to sf-DLNM in estimating the daily risk of respiratory mortality associated with temperature. Furthermore, mf-DLNM and MIDAS exhibited similar performance for weekly predictions. For high-resolution predictions, mf-DLNM and sf-DLNM showed similar performance, despite mf-DLNM having access only to low-resolution response data. Conclusion: This mixed-frequency approach in environmental epidemiology overcomes the limitations of predicting health risks using aggregated exposure data and provides estimates of high-resolution outcomes in the absence of high-frequency health outcome datasets.

17

Causal estimands and target trials for the effect of lag time to treatment of cancer patients

Goncalves, B. P.; Franco, E. L.

2026-04-08 epidemiology 10.64898/2026.04.07.26350338 medRxiv

Top 0.1%

12.2%


Timeliness of therapy initiation is a fundamental determinant of outcomes for many medical conditions, most importantly, cancer. Yet, existing inefficiencies in healthcare systems mean that delays between diagnosis and treatment frequently adversely affect the clinical outcome for cancer patients. Although estimates of effects of lag time to therapy would be informative to policymakers considering resource allocation to minimize delays in oncology, causal methods are seldom explicitly discussed in epidemiologic analyses of these lag times. Here, we propose causal estimands for such studies, and outline the protocol of a target trial that could be emulated with observational data on lag times. To illustrate the application of this approach, we simulate studies of lag time to treatment under two scenarios: one in which indication bias (Waiting Time Paradox) is present and another in which it is absent. Although our discussion focuses on oncologic outcomes, components of the proposed target trial could be adapted to study delays for other medical conditions. We believe that the clarity with which causal questions are posed under the target trial emulation framework would lead to improved quantification of the effects of lag times in oncology, and hence to better informed policy decisions.

18

The joint effects of exposure to prenatal pesticides and psychosocial factors on epigenetic age acceleration in the first 5 years of life in a South African birth cohort.

Abrishamcar, S.; Eick, S. M.; Everson, T.; Suglia, S. F.; Fallin, M. D.; Wright, R. O.; Andra, S. S.; Chovatiya, J.; Jagani, R.; Barr, D. B.; Lussier, A. A.; Dunn, E. C.; MacIsaac, J. L.; Dever, K.; Kobor, M. S.; Hoffman, N.; Koen, N.; Zar, H. J.; Stein, D. J.; Hüls, A.

2026-04-05 epidemiology 10.64898/2026.04.03.26350118 medRxiv

Top 0.1%

10.7%


Background: Prenatal exposure to pesticides and psychosocial factors often co-occurs, particularly in low- and middle-income settings, yet their joint effects on epigenetic age acceleration (EAA) in early life remain unknown. We investigated the joint associations of prenatal pesticide metabolites and psychosocial factors with EAA in the first five years of life in the South African Drakenstein Child Health Study. Methods: In 643 mothers, we measured 11 urinary pesticide metabolites and seven psychosocial factors during the second trimester of pregnancy. Child DNA methylation was measured in whole blood at ages 1, 3, and 5 years. EAA was estimated using the Horvath, Skin & Blood Horvath (skinHorvath), and Wu epigenetic clocks. Longitudinal associations were estimated using generalized estimating equations, adjusted for confounders. Joint mixture associations were evaluated using weighted quantile sum regression (WQS) and quantile g-computation (QGCOMP). Results: The joint prenatal exposure mixture was positively associated with Wu (β per one-quintile increase in the mixture [95% CI]: 0.41 years [0.15, 0.80]), skinHorvath (0.11 years [0.06, 0.16]), and Horvath EAA (0.31 years [0.20, 0.46]) over time using WQS. Psychosocial factors, particularly food insecurity, physical interpersonal violence, and stress biomarkers, contributed most to the total mixture effect for all clocks. Pyrethroid metabolites PBA and TDCCA were top pesticide contributors to Wu EAA. Pathway enrichment analyses of clock-specific CpGs revealed distinct biological architectures, with the Wu clock enriched for neurodevelopmental and immune pathways, and the Horvath clock enriched for metabolic pathways. Discussion: Joint prenatal exposure to pesticides and psychosocial factors was associated with increased EAA across early childhood, with psychosocial factors contributing the most to the total effect. These findings highlight the importance of assessing chemical and non-chemical stressors jointly and of clock-specific biological interpretation in epigenetic aging research.

19

GPS Mobility Tracking, Ecological Momentary Assessment, and Qualitative Interviewing to Specify How Space Produces Intersectional Health Inequities: Development and Pilot Testing of the Spatial Intersectionality Health Framework (SIHF) and IGEMA Methodology

Cook, S.; Pettus, B.

2026-04-28 epidemiology 10.64898/2026.04.09.26350546 medRxiv

Top 0.1%

10.5%


Background: Young sexual and gender minorities of color face compound health risks shaped by interlocking systems of racism, cisgenderism, and class inequality. Spatial health research documents that place shapes health, but existing methods cannot specify the mechanisms through which spatial configurations produce different health outcomes for differently positioned people. This gap prevents targeted intervention. Objective: To develop and pilot test the Spatial Intersectionality Health Framework (SIHF), which specifies three mechanisms through which space produces intersectional health inequities: Layered (multiple oppressive systems activating simultaneously), Positional (the same space producing different health pathways by intersectional position), and Conditional (nominally protective spaces carrying hidden costs for specific positions). We also introduce and validate Intersectional Geographically-Explicit Ecological Momentary Assessment (IGEMA) as the methodology operationalizing SIHF across three data levels. Methods: The GeoSense study enrolled 32 young sexual and gender minorities of color (ages 18-29) in New York City. IGEMA was implemented across three integrated levels: (1) GPS mobility tracking via participants' personal smartphones, linked to census tract structural exposure indices across n=19 participants; (2) ecological momentary assessment of intersectional discrimination with multilevel modeling of mood, stress, and sleep outcomes; and (3) map-guided qualitative interviews with SIHF mechanism coding and intercoder reliability assessment across 92 coded records from 18 participants. This study was conducted as the pilot for NIH R01HL169503. Results: All three SIHF mechanisms were empirically detectable. A compound structural gendered racism index outperformed every single-axis alternative in predicting daily mood (b=-0.048, p=.001) and stress (b=0.121, p<.001). The Positional mechanism accounted for 71% of coded harm experiences. Intercoder reliability for mechanism assignment reached kappa=0.824 at Stage 2 reconciliation. Daily intersectional discrimination predicted greater sleep disturbance (b=1.308, p=.004). Conclusions: SIHF and IGEMA together provide an empirically testable framework for specifying how space produces intersectional health inequities. Mechanism specification, not spatial location alone, is the condition for designing research and intervention that reaches the source of harm for multiply marginalized populations.

20

Change for life? Adolescent cognitive development predicts mortality risk independent of childhood ability

Walhovd, K. B.; Berg, A. I.; Buratti, S.; Buren, J.; Bjalkebring, P.; Fischer, M.; Hansson, I.; Hassing, L.; Jonsson, A.-C.; Jonsson, L.; Lindwall, M.; Nilsson, T.; Rogeberg, O.; Segerberg, A.; Thorvaldsson, V.; Landen, M.; Klapp, A.; Lovden, M.

2026-06-01 public and global health 10.64898/2026.05.23.26353598 medRxiv

Top 0.1%

9.2%


Lower cognitive ability measured in childhood or late adolescence has been consistently associated with higher mortality risk across adulthood. However, this evidence largely relies on single assessments, leaving it unclear to what extent mortality risk reflects cognitive differences established early in life versus developmental divergence during adolescence - a period of substantial neurocognitive plasticity. Using two nationally representative Swedish cohorts comprising 9,412 males born in 1948 and 1953, we linked cognitive ability assessed in primary school at age 13 years and military conscription at age 18 years to all-cause and cause-specific mortality recorded in nationwide registers through 2025. We decomposed late-adolescent cognitive ability into childhood cognitive level and adolescent cognitive change and evaluated their independent associations with mortality. Childhood cognitive level (HR = 0.81; 95% CI, 0.78-0.85) and adolescent cognitive change (HR = 0.84; 95% CI, 0.79-0.89) independently predicted lower mortality risk, also after adjustment for parental education. Childhood cognitive level and adolescent cognitive change showed partially distinct cause-specific patterns. Childhood cognitive level was most strongly associated with mortality from intrinsic causes, whereas adolescent cognitive change showed relatively stronger associations with external causes, particularly accidental deaths. Although adolescent cognitive change was associated with psychosocial factors including education and psychiatric diagnosis at conscription, its association with mortality persisted after adjustment for these factors. These findings suggest that cognitive development during adolescence carries independent prognostic information regarding long-term survival beyond cognitive level established by late childhood, highlighting adolescence as a consequential period for lifelong health.